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Abstract — A statistical frameworlt is introduced for a broad 
class of problems involving synchronization or registration of 
data across a sensor network in the presence of noise. This 
framework enables an estimation-theoretic approach to the 
design and characterization of synchronization algorithms. The 
Fisher information is expressed in terms of the distribution of 
the measurement noise and standard mathematical descriptors 
of the network's graph structure for several important cases. 
This leads to maximum likelihood and approximate maximum- 
likelihood registration algorithms and also to distributed iterative 
algorithms that, when they converge, attain statistically optimal 
solutions. The relationship between optimal estimation in this 
setting and Kirchhoff 's laws is also elucidated. 



I. Introduction 

Registration of data across a network is an ubiquitous 
problem in distributed sensing. Over more than three decades, 
much effort has been expended on development of algorithms 
to provide time synchronization across a distributed network; 
e.g., fTl-fSl. Synchronization of this kind is important for 
distributed parallel processing as well as data fusion across 
a sensor network. It is typically the case that the network 
is not complete; i.e., each node does not communicate with 
every other node. A large fraction of algorithms described 
in the literature produce algorithms to minimize an error or 
objective function based on least squares, often within power 
or other resource constraints. Leaving aside the latter issue, 
the problem in this setting is to assign a clock adjustment 
to every node based on knowledge of the clock differences, 
generally noisy, between some pairs of nodes in the network. 
Even if clock difference measurements are available for every 
pair of nodes in the network, the presence of noise still raises 
consistency considerations; e.g., the true offsets must sum to 
zero around any closed cycle. 

The domain of practical network synchronization problems 
is by no means limited to clock offsets, nor is the natural 
measurement space restricted to the real line. Individual nodes 
may possess multiple data to be registered cross the network, 
and the noise affecting such vector data may be correlated 
across its components. Further, the natural measurement space 
is often not a linear space. In phase synchronization, for 
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example, typical data could be measurements of the phase 
differences between local oscillators at the nodes. In this case, 
the natural measurement space is the circle T = M/ 27rZ rather 
then the real line M. If several local oscillators are involved, 
measurements might lie on the torus T". Another important 
practical example where the measurement space is a nonlinear 
multi-dimensional manifold is registration of local coordinate 
systems, for which the natural setting is the special orthogonal 
group 50(3). Even in the context of clocks, if both offset and 
clock speed are adjustable locally the offsets are elements of 
the affine group A. These examples illustrate that practical 
problems can entail data on Lie groups that are compact (e.g., 
T or 50(3)), non-compact (M"), abelian (M", T), or non- 
abelian (50(3) or A). 

It is common to represent networks by graphical models. 
In this setting, the network nodes that provide data to be 
registered or synchronized are represented by vertices labeled 
with their associated parameters, such as local clock time or 
local coordinate system. Each pair of vertices corresponding 
to a pair of nodes that are in direct communication are joined 
by an edge. Information is shared between vertices along such 
edges, each of which is labeled by a noisy measurement of the 
difference between the parameters at the end vertices of the 
edge. The notion of difference between two parameter values 
depends on the algebraic structure of the parameter space. In 
M", for example, it is defined by subtraction of vectors. In 
other spaces, which are assumed to be Lie groups, difference 
is defined in terms of the group operation. The estimation 
problem on which this paper focuses arises precisely because 
the difference values that label the edges are corrupted by noise 
in a sense that will be made precise later in the paper As a 
consequence of this corruption, the edge labels in any graph 
with cycles will generally be inconsistent; i.e., the edge labels 
around closed cycles will not sum to zero, even though the 
true difference values must do so. For a connected graph, the 
desired synchronization should provide a set of consistent edge 
labels that can be used to determine a unique offset value to 
register any pair of vertices in the graph. If one vertex label 
is known, this is equivalent to assigning labels to all other 
vertices consistently throughout the graph. 

The first goal of this paper is to frame a class of net- 
work synchronization problems encompassed by the graphical 
model just described in terms of statistical estimation theory. 
The second goal is to derive and characterize maximum- 
likelihood estimators or approximations thereof for a signif- 
icant subclass class of these problems, together with local 
algorithms that realize these estimators. Specifically, these 
estimation problems entail estimation of the vertex labels 
(parameters) from the edge labels (noisy data). This paper fo- 
cuses upon two cases in which the noise models are classical: 
Gaussian noise on R'^ and von Mises distributed noise on T. 
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Between them, these cases encompass most of the elements 
encountered in the situation where the measurements and noise 
are on abehan Lie groups, which is the most general setting 
addressed in detail in this paper The non-abelian situation, 
which presents additional mathematical challenges, will be 
addressed in a sequel. 

In each form of the estimation problem treated here, the 
Fisher information and maximum likelihood estimator are 
derived in a closed forms that depend both on the distribution 
of the noise and on the structure of the graph in intuitively 
appealing ways. In each of these cases, it is observed that 
the maximum likelihood estimator is non-local in the sense 
that the offsets for a node relative to its neighbors cannot be 
estimated at that node using only information obtained from its 
neighbors. Nevertheless, it is possible to find iterative (gradient 
descent) algorithms that are local and do converge to the ML 
estimator when they converge, which happens very frequently 
in empirical tests. 

Interesting problems associated with local maxima of the 
likelihood function that arise in the setting of compact Lie 
groups are discussed in connection with the circle case treated 
here. The value of the Fisher information and functions thereof 
(e.g., its determinant) in designing networks that support 
accurate registration is also discussed. 

Structurally, Section |ll] begins with a synopsis of the essen- 
tial concepts of graph theory and the associated notation that 
will be used throughout the paper. Some relevant introductory 
material on Lie groups is also summarized in this section, and 
the mathematical framework for discussion of the abelian Lie 
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group is set forth. Section III treats the case of Gaussian noise 
on and Section 



subsequently describes and analyzes the 
performance of local algorithms for this case. Section [Vjcovers 
the case of von Mises noise on T, followed by description and 
performance analysis for local algorithms for this situation and 
a discussion of critical points of the likelihood function. The 



general abelian Lie group case is presented in Section VI 



where it is shown to follow essentially the same pattern as the 
special cases presented earlier Finally, some preliminary re- 
marks on the important non-abelian Lie group setting appears 



in Section VII together with some concluding remarks. 



II. Graph Theory Preliminaries 

The aspects of elementary graph theory needed in the re- 
mainder of this paper are covered in many standard references, 
such as |6|. In what follows, a graph F will be assumed 
to have a finite vertex set V{T). The collection of edges 
in F will be denoted by E{r), where the edge e = {u, v) 
joins vertices u and v . If the graph is directed then edge 
{u, v) starts at u and ends at v. If the graph is not directed 
then {u,v) = {v,u). Unless otherwise noted, graphs in this 
paper will be directed, and the values labeling the edges will 
correspond to the difference between the label on the vertex 
at the end of the edge and the label on the vertex at the start 
of the edge. At several places in the paper, however, the graph 
orientation is irrelevant. 

In the following definitions of spaces of functions on V{r) 
and £'(F), the functions are assumed to be real-valued, as 



in the clock synchronization example. Later these definitions 
will be extended to cover other possible parameter spaces. The 
vertex space Co(F) of a graph F is the real vector space of 
functions V{T) M; elements of Co(F) are vectors of real 
numbers indexed by the vertices. Similarly the edge space 
Ci(F) is the real vector space of functions E{r) — > R. The 
number of vertices in F will be denoted by |T^(F)| — n and 
the number of edges by |i?(F)| = m. It will be convenient 
to fix an ordering on the sets V and E in order to construct 
bases for Ca(T) and Ci{T). Denoting V"(r) = {ui, . . . , t;„}, a 
basis for Co(F) is obtained by defining functions Vi, - ■ ■ , v„ 
according to 



1, 



(1) 



SO that any element of Co{T) can be written 

n 



X 



Co(F) is an inner product space with inner product {■,-)ca 
such that {vi,Vj)^^ — Sij, so that t^i,--- ,Vn becomes an 
orthonormal basis of the real inner product space. The linear 
map A : Co{T) Cq(T) defined by 



Avf 



where Vj ^ ve indicates that there is an edge connecting 
Vj and vg, is called the (undirected) adjacency map. The 
corresponding matrix in the Vf, basis is called the adjacency 
matrix. 

The degree dy of a vertex v is the number of edges either 
starting or ending at v, and the linear map N : Co(F) — >^ 
Co(F) defined by 

is called the degree map. The corresponding matrix in the 
Vj basis is called the degree matrix and is diagonal with 
the degrees of the vertices of F on its main diagonal. The 
Kirchhoff map (unnormalized Laplacian) of F is defined by 

L^N-A : Co(F) ^Ci(F). 

L is positive semidefinite and the dimension of its zero 
eigenspace (null space) is the number of connected compo- 
nents of F. 

Indexing the edges of F as E{r) = {ei, • • • , em}, a basis 
for Ci (F) can be constructed in similar fashion to the one for 
Co(F). Specifically, define functions ei, . . . , e„ by 



Then any element of Ci(F) can be written 



(2) 



Defining an inner product (•,-)ci such that {ei,ej)c-i = Sij 
makes Ci(F) into an inner product space. Linear source and 
target maps, respectively s,t : Ci(F) — Co(F) are defined by 

s{ej) = u and t{ej) — v 
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where ej ~ {u,v). Finally the (directed) incidence (or bound- 
ary) map of r is L> : Ci(r) Cq{T), where D ^ t - s. 
Certain concepts will be needed from homology theory, and 
will be introduced informally as required. The map D is, in 
homological terms, a boundary operator, which applied to any 
edge e — (u, v) gives 

D{e) = t{e) — s{e) = v — u. 

The Kirchhoff map can be written in terms of the incidence 
map as L = DD^, where the adjoint map : Co(r) — > 
Ci(r) is the coboundary operator defined by 

ej:t(ej)=v ej:s{ej)=v 



The cycle space Z{T) is defined as follows. A cycle is 
a closed path in F; that is, a sequence of vertices £ = 
V1V2V3 . . .Vq inT where Vi is adjacent to Vi^i for i = 1, . . .q 
and Vq is adjacent to vi. The corresponding element of Z{r) 
is a function G Ci(F) given by 

{1 if Cj E £ and Cj is oriented as £ 
— 1 if G £ and ej is oriented opposite to £ 
otherwise 

(3) 

Z{T) is the linear subspace of Ci(F) spanned by the zsi as £ 
runs over all cycles in F. The cycle space is exactly the kernel 
of D; that is, for all z e Z(F), 

Dz = 0, (4) 

and every z e Ci(F) satisfying Q is a linear combination of 
zq. This condition implies that, for z e Z{T), the oriented 
sum of the values on the set of edges meeting at any of the 
vertices is zero; i.e., 

ej:t{ej)—v ej:s{ej)—v 

for all V E V^(F). This, of course, is a statement of Kirchhoff's 
current law. For a graph with k connected components the 
dimension of ZiT) is the first Betti number of the graph; i.e., 
dim Z{T) = m — n + k. 

A second subspace of Ci(F) that arises in the development 
to follow is the cocycle space (cut set) J7(F), an element of 
which is defined by fixing a partition of the vertex set V^(F) 
into two disjoint sets; i.e., V^(F) = Vi U V2- With respect to 
this partition, define a vector wqj E U (F) to be 

{1 if Cj joins Vi to V2 
— 1 if Cj joins V2 to Vi 
otherwise 

C7(F) is the linear subspace of Ci(F) spanned by the wtp as 
*P runs over all partitions of V^(F). 

The cocycle space is exactly the orthogonal complement of 
Z{T) in Ci(F). Thus, for any z E Z{T) and any w E U{T), 

{z,u;)c,^0 (6) 



This implies that every u) = J^j'^j^j ^ U{T) satisfies 

for all cycles £, where aj = if ej is oriented as £ and 
aj = 1 if Cj is oriented opposite to £. This is Kirchhoff's 
voltage law. Furthermore, every vector uj E Ci(F) can be 
uniquely decomposed as 

x = uj + z, UJ E U{T) and z E Z{T) 

In other words, Ci(F) = Z(T) © U(T). It is customary in 
the mathematical literature (e.g., ||7]) not to choose a basis to 
identify the Ci{T) {i = 0,1) with their duals and to regard 
boundary and coboundary maps as being on different spaces. 
It is convenient here to identify them. 

The cocycle space [/(F) is also the image of Co(F) under 
the coboundary operator, U{T) — Im(Z?^). The kernel of 
is the space of locally constant functions on V{T); 
i.e., functions that are constant on the vertices of connected 
components. In the development to follow, there is no loss of 
generality in working on different connected components sep- 
arately. So from this point F will be assumed to be connected, 
and hence the kernel of is span{l} where 1 denotes 
the unit constant function on V(r). With this connectedness 
assumption, every x E Cq{T) can be decomposed uniquely as 

X = Duj + al, (7) 

where uj E U{T) and a G M. Formally, this decomposition is 
orthogonal since the kernel of is orthogonal to the image 
of D. 

A spanning tree 5 in F is a graph with V{S) = V{r) such 
that every pair of vertices is joined by exactly one path in S. 
Equivalently, is a maximal subtree of F. Kirchhoff's matrix 
tree theorem implies that the number of spanning trees i(F) 
in F is equal to the absolute value of any cofactor of £(F); 
i.e., t{r) is the modulus of the product of the n — 1 largest 
eigenvalues of L{T). 

In graph theory, one often encounters maps W : Ci(F) -> 
Ci(F) that describe some weighting on the edges of F. In the 
standard basis, such a map has diagonal matrix, and a weighted 
Laplacian can be defined by Z = DWD^ . The matrix tree 
theorem can be generalized to state that the absolute value of 
any cofactor of L is equal to 

E n ^(«) 

S eeS 

where the sum extends over all spanning trees of F. 

III. Gaussian Noise ON M'^ 

This section provides precise formulations of estimation 
problems that arise in connection with registration on networks 
in situations where the parameter values at each node are 
real numbers or vectors in M'*. The measurements of dif- 
ferences between parameter values at communicating nodes 
are corrupted by zero-mean additive Gaussian noise. The 
one-dimensional problem with independent noise on each 
measurement is treated first. The value of explicit expressions 
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for the Fisher information, its determinant, and maximum- 
likelihood estimators obtained for this case in network design 
problems is discussed. This is followed by treatment of the 
multi-dimensional problem, in which both correlated and 
independent noise are considered. 




Fig. 1. Each edge e of a directed graph F is labeled with a value that is 
the difference uje of labels at its boundary vertices plus a noise value Se ■ 



A. One-dimensional Gaussian problem 

The situation in which the parameter space is the real 
line and the differences between communicating nodes are 
corrupted by zero-mean additive Gaussian noise is developed 
here in detail because it illustrates the approach that will be 
used in the more complicated cases that follow. In this setting, 
the data vector r e Ci(r), is a sum of a vector of true 
difference values and noise; i.e., 

r = u} + e (8) 

where r and e are in Ci{T). Because the true difference 
values must sum to zero around any cycle, us E U{r). For 
the moment, assume that e is jointly normal with covariance 
matrix cr^I; i.e., that random variables Si are independent 
and identically distributed (iid) with variance cr^. With this 
assumption, the conditional probability density function for r 
given u) is then 



p{r\u}) 



n 



1 „ 



(27ra2)-l^(ni/2 



and the log-likeUhood function is thus 

£(r|a;) = — '^llci + constant 

The ML estimator minimizes ||r — ci;||ci, and is hence given 
by 

Cj = Tlur (9) 

where liu denotes orthogonal projection into U{T) with 
respect to the inner product on Ci. The residual r — a> then 
resides in the cycle space Z{V). This result provides a useful 
characterization of the ML estimator: the estimate satisfies 
Kirchhoff's voltage law and the residual satisfies Kirchhoff's 
current law. 



To explicitly compute (|9]l, it is desirable to parametrize the 
space C/(r), whose definition (j6|l does not well elucidate the 
nature of its elements. Two basic parameterizations are useful. 
In the first, a particular vertex is chosen as a reference. Then 
(|7]i implies that the value of u; € t/(r) is determined by the 
relative offsets of the other |V^(r)| — 1 = n— 1 vertices, denoted 
by W . The second parameterization of U{T) is obtained by 
choosing spanning tree S € E{r) of F and noting that if us 
is known on S, then all n — 1 offsets relative to a reference 
vertex can be determined by following the tree. 

In the first of these cases, the basis given above for span(VF) 
is chosen and x e span(VF) may be expressed as 

n-l 

Alternatively, for the fixed spanning tree S, one may write 
1/^5 = span(5') as 



where the p{j) label the n — 1 edges comprising the spanning 
tree S. 

These representations enable the definition of the (n— 1) x m 
matrix Dyy- and the (n— 1) x (n— 1) matrix Dws with entries 

[Dwh = {v,,Dej)ca 

[Dws] is the matrix of the restriction of D to span(S'). With 
these definitions, 

(10) 



= D^rX 



and 



1/ = D 



ws 



X. 



Dws is invertible; in fact |8, p. 101], 
det Dws — ±1- 



(11) 



(12) 



v = D 



ws 



Taking inverses in (111 yields 

x = {Di:sy^ 

where [Ps]ij = ^p{i).j is the matrix of the orthogonal projec- 
tion onto S. The matrix 

Lw = DwDw 

is the Laplacian (Kirchhoff) matrix with the row and column 
corresponding to w„ removed. 

With this notation, the maximum-likelihood estimates for x 
and V are then 



X 



LwDwr 
DwsLw^wr 



(13) 



and the estimate of u) is 



Dw^wDwr 
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The Fisher information matrix for estimation of the offsets Xi 



IS 



-E{V|logp(r|a;)} 



Equation ( [TO] l implies that 



dujj 

dx-j 



= D 



(14) 



and so 



1 



,DwD 



w 



Hence is proportional to the matrix of the Kirchhoff 
map in the standard basis, but with the row and column 
corresponding to the reference vertex removed. In a similar 
way the Fisher information matrix for estimation of the Vj is 

Note that det Lw is a minor of the Kirchhoff matrix L, 
and so by the Kirchhoff matrix tree theorem it is equal to the 
number of spanning trees t(r) of F. By ( [T2] i, the determinant 
of the Fisher information is thus 



detF 



w 



= fT-2("-i)t(r) 



(15) 



The best possible situation occurs when the pairwise dif- 
ference between all nodes is measured. In this case, F is a 
complete graph and the number of spanning trees is known 
to be t(r) = n'""^^ In this situation, the "average" Fisher 
information per node is 



(deti^'^)l/("-l) = i_„("-2)/(n-l) 



as n 



Both of the ML estimators x and a> are unbiased, and as a 
result, since the covariance matrix of r is ct^I, 



Cs, = E{{x - x}{x - x)^} 



1 



and its determinant is 



dctCri. = 



(16) 



as anticipated from ([TSj. The covariance matrix of the estima- 
tor di is 



C<i = E{(u;-cl>)(u;-cl>)T} 

u 



since Pu = D^^Ly^ Dw is the orthogonal projection onto 
C7(r). An interesting consequence of this observation is that 



B. Network design for independent errors on edges 

Before going on to other measurement models, it is instruc- 
tive to consider briefly the consequences of the above results 
in the design of a synchronization or registration scheme 
for a network. Since the ML estimator ( [T3| l is unbiased for 
any graph F, the role of the number of spanning trees t(F) 
in the determinant of the estimator covariance matrix ([T6|, 
or equivalently in the determinant of the Fisher information 
matrix ( [T5| ), shows that a large number of spanning trees is 
desirable for good estimator performance. 





(a) 



(b) 



Fig. 2. Two networks with the same number of nodes and links. Network 
(a) has two spanning trees while network (b) has four spanning trees and is 
hence superior in the estimation context of this paper. 

The network depicted in Figure |2ja) has the same number 
of nodes and links as the one in Figure |2|b). But the number 
of spanning trees in the former is two while the number 
of spanning trees in the latter is four. So, under the model 
assumed in this paper, estimation fidelity will be better for 
the network of Figure |2|b). From the perspective of design, if 
one has the opportunity to add one link to the acyclic network 
shown in Figure [3] the best choice in the context of this paper 
is to create a ring network (five spanning trees) and the worst 
is to create the network of Figure |2|a). 




Fig. 3. An acyclic network. If one link can be added, the best choice is to 
create a ring network. The worst choice is to create the network shown in 
Figure [IJa). 



C. Estimation with correlated measurements 

This section further examines the situation where G = M 
and the measurement model is given by ([8]), but now with the 
measurement errors, in the standard basis for Ci(r), being 
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jointly Gaussian with covariance matrix R. In this setting, the 
probability density of the measurements is 

p(r\uj) = — exp I — i^(r — a;)_R~^(r — a;)^ 

^ ' ' V(27r)"deti? 2^ 

The maximum-likelihood estimate of a; e U{r) is obtained 
by splitting the data r as 



r = o) + £, 



(17) 



where lj E U{T) and residual e satisfies R^^e E Z{T). That 
is, 

(I) = Qt/r 

where Q^j is now an oblique projection with range U (F) and 
null space R~^{Z(T)). This situation is illustrated in Figure [4] 





Fig. 4. The residual e of tlie ML estimate has the _R satisfies the 
Kirchhoff current law. 

for a diagonal covariance matrix with entries cr|. The estimate 
uj satisfies the Kirchhoff voltage law; i.e., the oriented sum of 
(jj around any cycle as intended. It is interesting to observe 
that it is now R^^e that satisfies the Kirchhoff current law; 
i.e., the oriented sum at any vertex is zero, with the covariance 
R playing the role of resistance in a way akin to Ohm's law. 

The columns of Dj^ are a basis for [/(F) and Q implies 
that the columns of R^^Djy ai-e a basis for (i?"i(Z(F)))-'-. 
Thus, 

uj = Dl^ {DwR-^Dl,y^ DwR-^r 
The corresponding ML estimate for the vertex offsets x is 



X = {DwR~ D 



1 nT 



DwR^ r 



Motivated by these expressions, it is convenient to define the 
weighted Laplacian 



1 nT 



and similarly 



L = DR-'D 



Lw = DwR ^Dw 



By ( [l4j i, the Fisher information matrix for the vertex 
parametrization is given by 

^-E{Vl\ogp{r\u)} 

= DwR^^DJ^,' = Lw 

The Cauchy-Binet formula [9| allows the determinant of the 
Fisher information to be written as 

detF^ = det{DwR~^Djy) 

T T' 



where T and T' denote subsets of columns (edges) which are 
retained. The sum extends over all subsets of E{r) of order 
n — 1. The matrices Dwt have the property |8, p. 101] 



det DwT — 



±1, if T is a spanning tree, 
0, otherwise. 



The only terms in the above sum over T which are non-zero 
are those corresponding to spanning trees and so 



det F'^ = Vass' det([i?-i]ss') 



The quantity 



ass' = dct{DwsDws') 



(18) 



(19) 



takes values ±1 depending on the pair of spanning trees S 
and S' and independently of the choice of W. 

With the assumption R = diag(a^, ...,(T^„), ( [T8] i simplifies 

det^^-.^n^^^ 

s e^eS i 

which reduces to further to ([TSj when a'j — for all Cj E 

E{T). 

The ML estimate of the vertex offsets x is unbiased and its 
covariance matrix is 

Ci, = £{{x-x){x-x)^} = L-^ 

and has determinant 

det 



1 



Es,s"ss' det([i?-i]ss') 

from ( [T8] l. The ML estimate of u) is also unbiased and has 
covariance 

= E{(u; - - CjY} = Dl^L^Dw 
— QuR 

D. Multi-dimensional Gaussian problem 

This section further generalizes the setting to G = W^, 
where the state of each vertex in the network is a vector in 
W^. In this situation, it is necessary to consider more general 
functions of the graph F than were treated in Section |ll] The 
vertex space now consists of functions ViV) and is 

correspondingly denoted by Co(F,M'^). In fact, 

Co(r,M'') =M''(g)Co(F,M) 

In terms of the standard basis j — 1^ - ■ ■ ,d for W^, any 
element of Co(F,IR'^) can be expressed as 

d n n 



i=l j=l 



3 = 1 



Similarly, the vector space of functions E{r) R'^ is 

Ci(r,M'*) =M'^(g)Ci(F,M) 

The boundary map on Ci(F,IR'') is I (E) D where I is the 
identity map on M'* and D is the boundary map on Ci(F). 
The coboundary map on Co(F, M.'^) is I ® . 
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As in the one-dimensional case, the cycle space Z{T,M.'^) 
is defined to be the kernel of the boundary map; i.e., the set 

of 2 e Ci(r,M'') such that 

(I(g)D)z = 0. 

Writing 

n 

z = ^ 2j ® 
the cycles satisfy a vector form of (jSj. 



i.e., at any vertex, the oriented vector sum of the values of z 
on the set of edges meeting at the vertex is zero. 

The cocycle space [/(F, M'*) is the image of the coboundary 

map I (g) . For any 

n 

the equation 

must hold for for all cycles £. In this expression, aj = if 
ej is oriented as £ and aj = 1 if ej is oriented opposite to £. 

An inner product can be defined on Ci (F, M''), based on the 
inner product defined earlier on Ci (F) and the standard inner 
product on M'', namely, for Si,S2 E M'' and Xi,X2 G Ci(F), 

{si (g) a;i, si (g) xi) = (si,S2)Rd(a;i,a;2)ci(r) 
For all z e Z(F,M'^) and uj e ;7(F,M'') 

i.e., Z{T,R'^) is the orthogonal complement of C/(F,M'*). 

1 ) Independent identically distributed edge measurements: 
Recall that the measurement model is r = a; + £ with e ^ 
N{0,R), r and e in Ci(F,M''), and a; e C/(F,M^). The first 
case of interest is when the errors Sj £ M"^ are independent. 
Their individual probability densities are 

for j — I, - - ,m. By independence of the £j, the joint 
probability density for the data r E Ci(F,M'^) is thus 

p(r\uj) = — — ——^ — J- exp— -(r— u;, (R^^^I)(i — lo)) 

The maximum likelihood estimate of ui is obtained by splitting 
the data r as 

r = ci) + £, 

where Gi e f7(F,M'^) and e G Z(F,M'^). Explicitly, uj e 
U{T,M.'^) is parameterized in terms of the offsets of n — 1 
vertices with respect to a reference vertex 

UJ = {I(SD'^)x 



or alternatively, in terms of a spanning tree S, as 

a; = (I (g (20) 
where x and v are related by 

x = {I(E) D^s)u 

In terms of these parameterizations, the maximum-Ukelihood 
estimate is 

X = {R-^ (g L^){R^^ g) Dw)r 
= (I (g L^Dw)r, 

and 



The Fisher information matrix for x is given by 
F^ = -E{V^logp(r|u;)} 
= (V,a;)^(I®i?-i)V^u; 
Equation ( |20| l implies that 

V^w = 

so that 



= (g Dw^w 
= i?"^ g) 

In a similar way, the Fisher information matrix for estimation 
of u is 

= R-^ ^DlrsLwDws- 
Tht determinant of the Fisher information in both cases is 

detF^ - detF« - ^^^^^^^^ - ^(^)' 

" (det " (det ' 

which is obtained by using the tensor product identity 

dctA(gS = (det A)'^™^ (dot S)'^™^. 

Finally, the error covariance of the unbiased ML estimate 
Cj is 

= R® dJ^L^Dw ^ R®Pu, 

where Pjj is the orthogonal projection onto U{V). Taking the 
partial trace over Ci(r) yields 

Trci(r)C<i = (dim;7(F))i?. 

2) Correlated edge measurements: Now assume the errors 
on the edge measurements Sj E M'' have joint probability 
density 

where R now denotes the md x md covariance matrix of the 
m d-dimensional edge measurements. The probability density 
for the data r e Ci(r,R'^) is 

pirluj) — — , exp I — (r — u), R^^r — | 
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where the inner product in the exponent is in Ci(r,IR'*). The 
ML estimate involves a decomposition of the data as 

r = a> + £, 

where w e U{T,R'^) and e e R-^{Z{r,R'^)). ExpHcitly, 
uj = Qijr, 

where Qjj is the obHque projection with range U{T, M'') and 
null space R-'^{Z{r,R'^)); i.e.. 

The ML estimate of the vertex offsets is 

a; = ((I ® Dw)R~\l ® Dw)y^ (I ® Dw)R^^r. 
The Fisher information for x is 

In order to calculate the determinant of , it is helpful to 
first find 

dei{l®Dw)T 

where T denotes a subset of [n — l)d of the columns. Now 
(I ® Dw) has the form 











M 





Dw 



























Dw) 



Write r = (Ti,r2,--- ,rd), where Tk denotes the set of 
columns selected from the fc* block. Since F is assumed to 
be connected, Dw has rank n — 1 thus det(I (g) ZJn/)^ = 
unless exactly n — 1 columns are chosen from each block, in 
which case 

d 

det(I Dw)t ^ n det(i:)wT, ) 

{±1, if each Tj is a spanning tree 
0, otherwise, 

S = {Si, 5*2, ... , Sd) will be called a multi-spanning tree if 
it consists of one spanning tree for each dimension of the 
parameter space R"^. Applying the Cauchy-Binet formula gives 

detF^ = dct{l Dw)t <ict{[R-^]rT') det{I Djy)r 
TT' 

= ^"55' det{[R-%s') 

SS' 

(21) 

where the sum is over multi-spanning trees. In this expression, 

d 
J = l 

with ass' given by ([19]), which takes values ±L If the edge 
measurement errors are independent then pT| ) reduces to 

detF^ = ^det([i?-i]55). (22) 

5 



The error covariance of the ML estimate of the vertex offsets 
X is 

E{{x - x){x - xf} = ((I (g) Dw)R'\l «) D^,))'^ 

So, 

det E{(a; — x)(x — x^} = = } 

and 

£{{u} - Cj){u} - (bY) 

= (I ® Dlr) ((I ® Dw)R-H^ ® (I ® Dl,) 

= QuR 

IV. Local Estimators for Gaussian Noise on M 

The estimators presented thus far have either implicitly or 
explicitly assumed a fixed reference vertex or a particular 
spanning tree. This is intrinsically incompatible with the 
development of local estimators; i.e., estimators that can be 
implemented in ways that each vertex follows some procedure 
that uses only information accessible from its nearest neigh- 
bors. Consider again the situation involving Gaussian noise on 
M with the noise on edge measurements being independent, 
treated in Section |III-A| and recall that the ML estimate for 

is such that the residual {r — us) S Z{T); i.e., it obeys 
Kirchhoff's current law. Thus, o) is the solution of 

DCj = Dr. 

Indeed, if x is any element of Co(r) satisfying 
Lx = Dr 

then U3 = D^x will be the ML estimate of uj. The quantity 
x may be interpreted as the collection of offsets each vertex 
needs to apply to its own coordinate in order for the entire 
network to be aligned in a statistically optimal way. It is 
interesting to note that addition of a positive multiple of the 
orthogonal projection onto the subspace of constant functions 
on V{T) to the Laplacian matrix creates an invertible matrix. 
Hence a solution to ( |IV[ ) is obtained from 

X = (L + ^-ll'^ )-^Dr 
n 

where 1 denotes a vector of ones. Since 1^ D = 0, this is the 
same solution for all fi > Q.ln fact, it is the solution satisfying 
l^x = 0; i.e., the set of offsets obtained do not change the 
mean of the vertex states across the network. 

If the linear system ( |IV| i is solved using Jacobi's method 
| fTO| , the structure of the Laplacian matrix L ensures this al- 
gorithm is local. Jacobi's method involves writing L = N — A 
in terms of the diagonal degree matrix N and the adjacency 
matrix A and applying the recursion 

^ (^jjr + ^a,(t)^ (23) 

A fixed point of this recursion satisfies ( |40] l. Thus if the 
method converges, it gives the ML estimate. Jacobi's method 
is known to converge if the matrix L is diagonally dominant 

|L,,| >^|L,,|, (24) 



9 



although this is not a necessary condition. The Laplacian 
matrix of a graph satisfies 



Another sufficient condition for convergence of Jacobi's 
method is the so-called walk-summability condition p2) . This 
specifies that the spectral radius of N^^A is less than one; 
i.e., 

p{N'^A) < 1. (25) 

The Gershgorin circle theorem |l9) implies that p{N^^A) < 1. 

The Jacobi algorithm has been applied in MAP estimation 
using Gaussian belief propagation in Bayesian belief networks 
|[T3J. In this work, it is noted that when neither ( p4| and ( [ZS] ) 
are satisfied and Jacobi's method fails to converge, conver- 
gence can forced by using a double-loop iterative method. In 
the estimation problem of interest in this section, the possible 
violations of the sufficient conditions for convergence are as 
mild as can be found in practice. In experiments run to date, 
Jacobi's method has never failed to converge. 

Once the recursion ( |23| ) is written out in detail, the update 
for the A:* vertex becomes 



1 

n-fc 



where r(£_fc) is re if e = (vi.Vk) G ^^(r) or -re if e = 
{vk,Vi) G E{T). Thus the recursion is indeed local. Note that 
a;^*-* fc) is the current prediction at the neighboring vertex 
£ of the value at the fc* should be. At each vertex, a single 
iteration of the algorithm can be summarized as "become the 
mean of what your neighbors say your value should be." 

An alternate way to state the recursion, which will be 
important subsequent application to estimation in Lie groups 
other that M'', is as follows. Denote an action of M on itself 
by 

TrX — X + r 
Then ( |23| ) can be written as 

where 

[O, otherwise. 

Although it may be possible to develop local algorithms 
for some forms of correlation between measurement noise on 
different edges of the graph, this possibility is not explored 
here. For G = M'^, any x e Ci(r,E'^) satisfying 



{l(E)L)x = {I®D)r 



(26) 



then a; = (I (g) D^)a; will be the ML estimate of a;. Jacobi's 
method in this case give the recursion 



= (I ® N)-^ f (I (g) D)r + A)x^*^ 



(27) 



The fixed points of ( |27| i satisfy ( p6) l and thus 

\{l<E,L)u\ = Y.\{l<E>L),j\. 

and by the Gershgorin circle theorem 

p((I®iV)-i(I® A)) < 1. 
In terms of the action of M"* on itself 

Tr =x + r, 

( p7| ) becomes 

where the operator Q has a "block" form with the (fc,^)* 
block being 

[O, otherwise. 

V. Phase Alignment 

This section addresses the case of phase estimation in a 
network. For this problem, the natural parameter space at the 
vertices is the group of real numbers modulo 1, or equivalently 
the circle group comprised of complex numbers with absolute 
value one; i.e., T = {e^^^*^ : 9 e [0, 1)}. In what follows, it is 
convenient to use the latter description. 

In the preceding sections, the offsets between measurements 
at vertices naturally form elements of a vector space. In 
the current situation this will no longer be true, resulting in 
significant differences in both theory and algorithms. These 
issues are elucidated here in the context of measurements on 
the circle T and in slightly more generality in Section VI 
where the data reside in a connected abelian Lie group. The 
setting of a compact (not necessarily abelian) Lie group, which 
includes the important practical situation G = 5*0(3), will be 
addressed in a sequel to this paper. 

A. Cycles and Cocycles 

Following the treatment in Section |llj denote by Co(r,C) 
the collection of functions from the vertices ^(r) to C. 
Similarly, denote by Ci(r,C) the vector space of complex- 
valued functions on the edges £'(r). Any element of Co(r, C) 
can be written in terms of the basis functions ([TJ as 

n 

X = XjVj 
J = l 

where the Xj are now complex numbers. Similarly, the basis 
functions (|2| can be used to write any element of Ci(r, C) as 



with Xj e C. The definitions of the incidence map, etc., then 
generalize in a straightforward way to this situation. 

In this setting, the appropriate space for the measurement 
data is the collection of functions from the edges £^(r) to T. 
Motivated by analogy with the preceding cases, this will be 
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denoted by Ci(r,T). Also as earlier, Co(r,T) will denote 
the collection of functions from V(T) to T. Both Ci(r,T) 
and Co(r,T) are groups under pointwise multiplication of 
complex numbers. However, although Co(r,T) C Co(r,C) 
and Ci(r,T) C Ci(r,C), neither Co(r,T) or Ci(r,T) are 
vector spaces. In the absence of a vector space structure, in 
particular the abilities to use bases and to use inner products 
to identify spaces with their duals, it is necessary to adopt 
a modified approach to the definition of the boundary and 
coboundary operators. This begins by considering the "spaces" 
of functions Co(r,Z) C Co(r,C) from the set of vertices of 
r to the integers Z and Ci(r,Z) C Ci(r,C) from £'(r) to 
Z. An element of Co(r,Z) can be expressed in terms of the 
basis functions for Co(r, C), which are themselves elements 
of Co(r,Z), as 

n 

It is straightforward to verify that set Co(r,Z) is an abelian 
group. Similarly, an element of Ci(r,Z) can be expanded 
in terms of the basis elements Sj of Ci(r, C) with integer 
coefficients. 

In place of the inner product are "pairings" between 

Ci{T,Z) and Ci(r,T) given by 

(^,x)t,i= n <^ (2;eCi(r,z), TeCi(r,T)), 
(2,t)t,o= n <^ (2eCo(r,z), xeCo(r,T)), 

v£V{T) 

(28) 

where, in each case, t^^ = t is the complex conjugate of 

T e T. 

The boundai-y operator D : Ci(r,C) -> Co(r,C) is 
defined, as in Section |ll] on the "basis" elements by 

D{e)=t(e)-s[e) (eeCi(r,C)), (29) 

with the convention that a vertex is identified with the corre- 
sponding member of the basis of Co(r,C). When restricted 
to Ci(r,Z), the operator D maps Ci(r,Z) into Co(r,Z). 
Similarly, maps Co(r,Z) into Ci(r,Z) and the Laplacian 
L — DD^, adjacency, and degree maps all map Co(r, Z) into 

Co(r,z). 

The functions defined in ([3]), where £ is taken over all 
cycles, are functions in Ci(r,Z). The cycle space Z{T,Z) C 
Ci(r,Z) is now regarded as a set of integer combinations 
of the Z£,; i.e., it consists of integer sums "-fe^fit where 
£fc are cycles. The cycle space is the kernel of the boundary 
operator ( |29] l acting on Ci(r,Z), so Dz — for all z E 
Ci(r,Z). Note that Z(T,Z) C Z{T,C). 

With X : V{G) ^ T e Co(r,T), the T-coboundary 
operator can be defined as 

This is the dual operator of D using the pairings (•, •)T,i 
defined in (|28|; i.e.. 



for all z e Co(r,Z) and r e Ci(r,T). The T-cocycle space 
is defined to be the image of the operator D^,: 

c/(r,T) = i?,(Co(r,T))cCi(r,T). 

Members u) of U (F, T) share a property corresponding to (|6|; 



I.e. 



{z,u;) 



T,l 



n -1 

ee£;(r) 



for all z G Z{r,Z). In other words, they are "orthogonal" to 
the cycle space Z{T,Z) in the sense of the pairing defined in 
( |28| ). This is the analogue of Kirchhoff's voltage law in this 
setting: the oriented product of the elements of a; around any 
cycle is the identity element of T (i.e., 1). 

B. Distributions 

A crucial issue in the analysis of the phase alignment prob- 
lem is specification of an appropriate model for noise on T. 
Two distributions are typically considered for circular statistics 
|14|: the wrapped normal and the von Mises distributions. 
While the wrapped normal distribution is in wide use, using it 
here leads to an analysis that is essentially the same as given 
in Section III-A for Gaussian noise on E. It is appropriate and 



interesting then to consider the effects of using the von Mises 
distributed in what follows. 

Given a set of N points Zj e T, the usual definition of 
circular mean z G T is given by 



Az 



1 ^ 

-T 



The quantity A is a measure of concentration and is related 
to the circular variance p of the sample by p = \ — Al' . 

The von Mises distribution is the maximal entropy distribu- 
tion p(e*^) subject to the constraint 



Ap= I e'^p{e''>) de, 
Jt 



where p is the circular mean and 1~A^ is the circular variance. 
As such, it is the least biased distribution under this constraint. 
A random variable Z with this distribution is of the form 
e*^° Zo where Zq has circular mean one and the same circular 
variance as Z. The von Mises density function with circular 
mean u and circular variance 1 — A^ is 



p{z 



1 



where 



A 



2^/o(k) 
1 



COs(^ — /i) 



62 



{ZgZ + ZoZ*) 



(30) 



7o, Ii are respectively the first and second modified Bessel 
functions of the first kind, and Zq = e'^. The value of A 
determines k via this equation. 

For each edge e £ E{r), a measurement on e has the form 



(2;,D*t)t,o = {Dz,t)t,i, 
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where E [/*(r,T). Thus, for some 

X = {x^)v€V{r) e Ci(r,T) 
it is possible to write ujf, — Xff^^^x'^^y Hence 



or, more succinctly. 



'^e — a;t(e)£ea;s(e) 



r = ~ eD^,x. 



(31) 



C. The estimation problem 

The joint distribution of the noise e G Ci(r,T) is taken to 
be ^ 

P(^)= n O T t \ exp^(£e+£^). 

eeE(r) ^' 
Consequently, the density of r conditioned on a; is 

ee£;(r) ^ ee_E(r) 
The log-likelihood is then 

ee£;(r) ee£;(r) 

(32) 

The Fisher information for this estimation problem can be 
found by first differentiating £{r\ui) with respect to 6^ (cf. 
Q): 

— £(r|a;) = ^ yi(cj^re - cje?^) 



(Were - Were) 



e:s(e)— V 

- E y^O 

e:i(e)— f; 

= ^ HeUj^Te - ^ teCJ^r, 

e:t[e)—v e:s{e)—v 

= Im ^ Dj^eKeUj^Te 

eeE{r) 

Differentiating again with respect to 9u yields 
d^e{r\uj) 



— — Re ^ KeD^^eDve 
ei£E{r) 

The Fisher information is thus 

'd^l{r\uj)' 

= Re ^ K^DueDve^^^ire} 



F = -E 



Note that 



e6£;(r) 



SO that this reduces to 

F = Dy/FD^ 
where F is the diagonal matrix 

'Kei/l(Kei) Ke„/l(Ke„J 



F = diag 



1 1 



^o(Ke,„) 



The quantity KeIi{Ke) / Io{Ke) associated with an edge e may 
be interpreted as follows. Consider the problem of estimating 
We given the datum r^. As above, the conditional density is 

1 K 

p(re|we) = , , , — -exp-;^(w;:re+Wer^). 

2TTlo{Ke) 2 

The parametrization, We = exp{i(f>g), yields the Fisher infor- 
mation for estimation of 0e from as 
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l0gp(re|We 



Thus when the Fe are regarded as edge weights, the graph 
Fisher information is the weighted Laplacian with these 
weights. The determinant of the Fisher information is, by 
Kirchhoff's matrix tree theorem, 

S eeS 

If the edges of F share a common k, this reduces to 

kIi(k) 



detF 



■m- 



D. Maximum-likelihood estimator 

This section considers the problem of determining the 
maximum-likelihood estimator x for the vertex offsets from 
data r. Differentiation of the likelihood ( |32] i reveals that the 
critical points must satisfy 

Im ^ DyeKeUlere = 

at every vertex v S V{r). In this expression, w denotes the 
ML estimate of If the map K : Ci(r,T) ^ Ci(r,C) is 
defined by 

K{£) = ^ Ke£e 
e6B(r) 

and the residual is written as ie — (^e^e, the critical points of 
the hkelihood satisfy 



w e u{r,T) 
im{K{£)) e z(r,c) 



(33) 



and Rc {K{£)) is the corresponding value of the log-likelihood 
(up to a constant). These correspond to ( [T7] i for the Gaussian 
case. This condition can be rearranged as 



E ' 

eeE{v) 



•^e \^e' e 



(34) 



for all V £ ^(r). In other words, the weighted sum of the 
directed residuals at any vertex must be real. This condition 
may be seen as a generalization of the Kirchhoff current law 
applicable to the group T and circular statistics. The ML 
estimate for x will be among the solutions of ( |34j l. In contrast 
to the situation for M"^, there are now a number of critical 
points. In consequence, to obtain a reliable fast estimator, it 
is necessary to distinguish the global maxima from the other 
critical points. 
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^1 



lm(«:oco + Ki£i + K2£2) = 0, 




Fig. 5. Schematic of Kirchlioff laws for tlie group T. At a maximum of 
the likelihood \32) the residual e satisfies Kirchhoff 's current law in the form 



Writing 

SO that Kydy measures the strength of the connection of 
the vertex v to the rest of the network, reveals that what 
distinguishes the ML estimate from the other critical points 
of the hkehhood function is that, for moderate noise values, 

1 x-^ Kp , 



eeB(«) " 

In terms of the augmented adjacency matrix 



(35) 



{H-«,v)r{v,v') if (w, w') e £^(r) 
V!-') ifK,«)G^(r), (36) 
otherwise 

equation ([34]i can be reformulated with the aid of ( |3T| ) as 

Qx = N^^Ax = Gx, (37) 
where N,y_yi — kydy6y,i,' and G is a real diagonal matrix. 



The approximation ( (35| corresponds to k^dv « pv or G « I. 
When Ke ^ K for all e £ E{r), the matrix N is k times the 
degree matrix. 

A fast estimator for x may be obtained as follows. First, find 
the eigenvector y E corresponding to the largest eigenvalue 
of Q, 

Qy - Ay (38) 

Then Xy — yv/\yv\ for all v £ ^(r)- The structure of this 
estimator is elucidated by consideration of its local imple- 
mentation, which can be obtained by applying the power 
method |9| to ([38|. This involves application of the matrix 
Q repeatedly to a random initial vector y'-'^'. The method 
results in convergence to an eigenvector of Q corresponding 
to the eigenvalue of largest magnitude provided this is the 
only eigenvalue of largest magnitude and the initial y(°' is 
not orthogonal to a left eigenvector of Q. In practice, to avoid 
numerical overflow, the resultant vector is re-normalized after 
each application of Q. This is a critical point in turning ( |38| ), or 
any other eigenvector based estimator, into an algorithm that 
can be run locally on the network. The normalization of y is 
a global operation since it involves each vertex knowing the 
values of y across the entire network. Without the ability to 
normalize vector, it is necessary to ensure that the eigenvalue 
of largest magnitude is close enough to unity that the iteration 



converges well before an overflow or underflow condition is 
reached. The algorithm proposed here indeed has the property 
that the largest magnitude eigenvalue A is approximately one. 
Further, as demonstrated in the simulations below, it operates 
rehably to align the network without renormalization. 

To examine the local operation of this algorithm more 
closely, write yy = UyXy where a^, is a real amplitude and 
Xv G T for each v € V{T). In addition to its phase x^ S T, 
each vertex keeps a circular variance < Oy < 1 which 
measures the degree to which it agrees with its neighbors. 
The local update rule for the case — k for all e e E{T) is 

1 



^(™+i)^(™+i)^ ^„ 

Cty 



(m) ( {ni) 



(a;r^r(n..))''-<"-' (39) 



Thus, at each update, the vertex v resets to the weighted 
circular mean of what its nearest neighbors predict its phase 
should be (compare this to Jacobi's method for alignment 
in W^). The weighting is based on how well each of the 
neighbors are aligned with their own neighbors. In this way, 
the contributions of vertices that have not yet converged 
are discounted relative to the ones from vertices that have 
converged to alignment with their neighbors. More generally, 
the update takes the form 

(m+l) (m+l) _ Jl V '^(^ii^oM (r^^^)r, 
Uy Ky 

The convergence rate of the power method algorithm de- 
pends on the distance between the largest and next-to-largest 
magnitude eigenvalues. The Gershgorin circle theorem Q 
implies that 

-1 < A < 1 

The impediment to convergence comes from eigenvalues of 
Q near —1. Regularization can alleviate this to some extent. 
Consider the matrix 

= {N + I3I)-\A + I31) 
where /3 > is a regularization parameter. If y satisfies 

Qpy = Ay 

then 

Qy = \y + l3{\-l)N-^y^\y 

Thus, if the maximum magnitude eigenvalue is close to one, 
the regularization leaves it close to 1. For the regularized 
eigenvalue problem, Gershgorin's theorem implies 



-1 



< A < 1 



where Kmax is the largest element of the matrix N . Thus the 
parameter (3 can be used to bound the smallest eigenvalue 
away from —1, the effectiveness being mitigated by any vertex 
having a large value of k„. 

This estimator exhibits remarkably good performance. In 
cases tested, the estimator gives a mean circular error per- 
formance indistinguishable from that of actual maximum 
likelihood, even when k w 1. Although it seems to have 
no real bearing on practical estimation performance, it is 
possible to construct estimators that give values closer to 
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the maximum-likelihood estimate. One example of such an 
estimator, the hybrid maximum-likelihood algorithm, starts by 
running a suitable number of iterations of the local algorithm 
just described. Then it switches to the following iteration 



Jm+l) (in+l) _ 



1 



5:^a(")(4'")r.)^^0, (40) 



where 



Re((Ax('"))Jx. 



At a particular vertex, the algorithm stops when the size of the 
imaginary component of (Aa;('"')i,/a;u '' drops below some 
threshold. At this point, x satisfies the condition ( (37| ) with 
tolerance corresponding to the threshold, and thus a maximum 
of the likelihood function to this tolerance is obtained. 

The algorithms described above were tested on two sim- 
ulated networks, one with five nodes and the other with 
31 nodes. The errors on each of the edges were identically 
distributed with concentration parameter k. Two global algo- 
rithms were tested the where the eigenvectors corresponding 
to the largest eigenvalues of the matrices Q and A defined in 
( |36| l and ( [37] ) were computed. These were compared with the 
two local algorithms; i.e., the algorithm ( (39] l for arriving at an 
the eigenvector of Q corresponding to the largest eigenvalue 
and the hybrid maximum-likelihood algorithm described in 
( |40| i. These local algorithms were implemented on a network 
simulation with only nearest neighbor communication and 
with purely local stopping criteria. The results are shown in 
Figures |6] and |7j where the network structure is shown in the 
bottom left hand corner of each graph. The figures show the 
mean circular error across the network; i.e.. 



CE{x) = 1 



\V{T)-1\ 



1 — 

veV(T)/vo 

where is the true value of the node offset. In each case, 
these results are compared with the the trace of the inverse of 
the Fisher information. 



Tri. 



k,Ii{k) 

These results indicate that the local Q eigenvector estima- 
tor works as well as the global methods, gives essentially 
maximum-likelihood performance, and lies very close to the 
trace of the inverse Fisher information. 

VI. Abelian Lie Groups 

This section briefly describes the situation in which the 
measurements reside in a general connected abelian Lie group. 
Such groups are just products of groups considered in the 
previous sections; i.e., G is of the form M'^ x "f, and its 
elements may thus be written as, g — [x, z) where x E M."^ 
and z = {e'^^^^'')1=i- Its dual group, which is the group of 
homomorphisms from G to T, is identifiable with M.'^ x Z'', 
with elements t = (y, n), via the pairing 

{9,t) = {{x,z), {y,n)) = cxpi{x.y + 2'n^nk6k)- 

k=l 
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Fig. 6. Mean circular error versus concentration parameter k for tlie four 
estimators operating on tlie five-node network shown in the inset. 




Fig. 7. Mean circular error versus concentration parameter k for the four 
estimators operating on the 31-node network shown in the inset. 



As in the formulations treated previously, the edges of the 
graph r are labeled with offsets that are elements of G. Rather 
than treating this case in complete detail, the focus of what 
follows in on synopsis of the aspects of the problem where 
significant modification of the approach described for earlier 
cases is needed to treat this more general situation. 

A key issue in this setting is the choice of a suitable model 
for the distribution of the noise corrupting the edge labels. 
Little work has been done on probability distributions on 
abelian Lie groups. Even the situation of a g-torus, which 
is simply a product of circles, for q > 2 is little explored, 
although the case q = 2 is addressed in work of Singh et 
al. 1 15 1, in which a version of the von Mises distribution is 
used in connection with an application to estimation of torsion 
angles in complex molecules. Specifically, they use 

p{Oi,02) = Cexp(Ki cos(6'i - 

+ K2 cos{92 - H2) (41) 
+ A sin(6'i - /^i) sin(e'2 - ^^2)) • 

The authors have studied this topic and intend to address it 
in a later paper, where a detailed analysis of appropriate error 
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distributions on such groups will be presented. For the present 
purpose, however, it suffices to assume sufficiently smooth 
densities Pe{g) — Pe{x, z) for g = (a;, 2;) e G = M'^ x T"^ and 
e e i?(r) to allow the Fisher information to be calculated. 
Also, for this discussion, attention is restricted to the case 
where noise on distinct edges is independent. 

The methods adopted in the previous examples can be 
mimicked and generalized as follows, where now the group 
operation is written additively (so that the circle group T is 
represented as the real numbers modulo 27r). The data vector 
r is, therefore, given by 



for each e e E{r). In this expression, u) is parameterized in 
terms of the vertex offsets x. The probability density for e is 

/(£)= n -u^-)- 

ee-B(r) 

The density for the noisy measurements is 

fir\x)= Yl /e('^e - a;t(e) +a;s(e)) 

eeE{r) 

and, for given data r, the log-likelihood function is 

ee-E(G) 

where and Xe are elements of G. It remains to define an 
"edge Fisher information" as 

where involves partial derivatives in each of the real and 
circular coordinates of G. Observe that is a (d+q) x (d+q) 
matrix indexed by the coordinates of G. 

By analogy to the case of M'*, the Fisher information matrix 
for X is given by 

F^ = -E{v2logp(r|a;)} 

where is a m{d + q) x ni{d + q) matrix (m is the number 
of edges in the graph) with components 

-F{e,s),{e' ,t) = 5e,e'[-f' e]s,t 

for s, i = 1, • • • , d + Using 



'D, 



= {l(^Dw)J'{l(g)Dl,). 
In a way that closely parallels the multi-dimensional Gaussian 



case treated in Section III-D (cf. equation ([22|i), the determi- 
nant of the Fisher information in this case is 

detF'^ = ^det(J55), 

where the sum is over all multi-spanning trees S — 
{Si,S2, ■ ■ ■ ,Sd+q) consisting of one spanning tree for each 
dimension of G. 



VII. Discussion and Conclusions 

The preceding sections have introduced a statistical frame- 
work for registration and synchronization of data collected 
at the nodes of a network. The approach, which formulates 
the registration problem as one of optimally estimating true 
offsets between data at communicating nodes from noisy 
measurements of these values. Explicit estimators were derived 
and analyzed for the case in which the data reside in M."^ 
and the noise corrupting the measurements is Gaussian. These 
solutions were pointed out to have a homological character 
leading to conditions akin to Kirchhoff's laws that optimal 
estimates and their residual errors must satisfy. They were 
shown to provide insight about how network topology can 
be designed and adapted to promote accurate synchronization 
across the network. 

The phase alignment problem in which the data are on the 
circle T has also be treated explicitly and has been pointed 
out to manifest the critical properties of the more general 
case in which the data belong to a connected abelian Lie 
group. Further, iterative local algorithms, in which nodes only 
make use of information from their nearest neighbors in each 
iteration, have been described and empirically demonstrated. 

Important practical cases, including alignment of local co- 
ordinate systems, involve non-abelian Lie groups. Application 
of the approach set forth here to this class of problems will 
be treated in a sequel to this paper 
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